On characterizing bandwidth requirements of parallel applications
Anand Sivasubramaniam, Aman Singla, Umakishore Ramachandran, H. Venkateswaran 
May 1995 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 1995 
ACM SIGMETRICS joint international conference on Measurement and 
modeling of computer systems, Volume 23 Issue 1 

Additional Information: full citation , abstract , references , citings , index 
terms 



Full text available: pdf(1.15 MB)



Synthesizing architectural requirements from an application viewpoint can help in making 
important architectural design decisions towards building large scale parallel machines. In 
this paper, we quantify the link bandwidth requirement on a binary hypercube topology for 
a set of five parallel applications. We use an execution-driven simulator called SPASM to 
collect data points for system sizes that are feasible to be simulated. These data points are 
then used in a regression analysis for projec ... 



2 Power supply, voltage, and frequency management: Dynamic voltage and frequency
scaling based on workload decomposition
Kihwan Choi, Ramakrishna Soma, Massoud Pedram 

August 2004 Proceedings of the 2004 international symposium on Low power 
electronics and design 

Full text available: pdf(416.31 KB) Additional Information: full citation , abstract , references , index terms

This paper presents a technique called "workload decomposition" in which the CPU workload 
is decomposed in two parts: on-chip and off-chip. The on-chip workload signifies the CPU 
clock cycles that are required to execute instructions in the CPU whereas the off-chip 
workload captures the number of external memory access clock cycles that are required to 
perform external memory transactions. When combined with a dynamic voltage and 
frequency scaling (DVFS) technique to minimize the energy consumpt ... 

Keywords: dynamic voltage and frequency scaling, workload decomposition 



3 Power optimization for real-time and media-rich embedded systems: Off-chip
latency-driven dynamic voltage and frequency scaling for an MPEG decoding
Kihwan Choi, Ramakrishna Soma, Massoud Pedram 

June 2004 Proceedings of the 41st annual conference on Design automation 

Full text available: pdf(365.55 KB) Additional Information: full citation , abstract , references , index terms 

This paper describes a dynamic voltage and frequency scaling (DVFS) technique for MPEG
decoding to reduce the energy consumption using the computational workload 
decomposition. This technique decomposes the workload for decoding a frame into on-chip 
and off-chip workloads. The execution time required for the on-chip workload is CPU 
frequency-dependent, whereas the off-chip workload execution time does not change, 
regardless of the CPU frequency, resulting in the maximum energy savings by setting ... 

Keywords: MPEG decoding, low power, voltage and frequency scaling 



4 A power metric for mobile systems

T. Martin, D. Siewiorek 

August 1996 Proceedings of the 1996 international symposium on Low power 
electronics and design 

Full text available: pdf(49.73 KB) Additional Information: full citation , references , citings , index terms



5 Synthesizing Realistic Computational Grids
Dong Lu, Peter A. Dinda 

November 2003 Proceedings of the 2003 ACM/IEEE conference on Supercomputing 

Full text available: pdf(224.44 KB) Additional Information: full citation , abstract

Realistic workloads are essential in evaluating middleware for computational grids. One 
important component is the raw grid itself: a network topology graph annotated with the 
hardware and software available on each node and link. This paper defines our 
requirements for grid generation and presents GridG, our extensible generator. We describe 
GridG in two steps: topology generation and annotation. For topology generation, we have 
both model and mechanism. We extend Tiers, an existing tool from t ... 

6 Fine-Grained Dynamic Voltage and Frequency Scaling for Precise Energy and
Performance Trade-Off Based on the Ratio of Off-Chip Access to On-Chip
Computation Times

Kihwan Choi, Ramakrishna Soma, Massoud Pedram 

February 2004 Proceedings of the conference on Design, automation and test in Europe 
- Volume 1 

Full text available: pdf(757.37 KB) Additional Information: full citation , abstract , citings , index terms

This paper presents an intra-process dynamic voltage and frequency scaling (DVFS) 
technique targeted toward non real-time applications running on an embedded system 
platform. The key idea is to make use of runtime information about the external memory 
access statistics in order to perform CPU voltage and frequency scaling with the goal of 
minimizing the energy consumption while translucently controlling the performance penalty. 
The proposed DVFS technique relies on dynamically-constructed regres ... 

7 Energy Optimization of Distributed Embedded Processors by Combined Data
Compression and Functional Partitioning

Jinfeng Liu, Pai H. Chou 

November 2003 Proceedings of the 2003 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(271.86 KB) Additional Information: full citation , abstract , index terms

Transmitting compressed data can reduce inter-processor communication traffic and create
new opportunities for DVS (dynamic voltage scaling) in distributed embedded systems.
However, data compression alone may not be effective unless coordinated with functional
partitioning. This paper presents a dynamic programming technique that combines
compression and functional partitioning to minimize energy on multiple voltage-scalable
processors running pipelined data-regular applications under performance cons ...

8 High-Level System Modeling and Architecture Exploration with SystemC on a Network
SoC: S3C2510 Case Study

Hye-On Jang, Minsoo Kang, Myeong-jin Lee, Kwanyeob Chae, Kookpyo Lee, Kyuhyun Shim 
February 2004 Proceedings of the conference on Design, automation and test in Europe 
- Volume 1 

Full text available: pdf(104.51 KB) Additional Information: full citation , abstract , citings , index terms

This paper presents a high-level design methodology applied on a Network SoC using 
SystemC. The topic will emphasize on high-level design approach for intensive architecture 
exploration and verifying cycle accurate SystemC models comparative to real Verilog RTL 
models. Unlike many high-level designs, we started the project with working Verilog RTL 
models in hands, which we later compared our SystemC models to. Moreover, we were able 
to use the on-chip test board performance simulation data to ver ... 

9 A survey of power management techniques in mobile computing operating systems
Gregory F. Welch 

October 1995 ACM SIGOPS Operating Systems Review, Volume 29 issue 4 

Full text available: pdf(763.75 KB) Additional Information: full citation , abstract , citings , index terms

Many factors have contributed to the birth and continued growth of mobile computing, 
including recent advances in hardware and communications technology. With this new 
paradigm however come new challenges in computer operating systems development. 
These challenges include heretofore relatively unusual items such as frequent network 
disconnections, communications bandwidth limitations, resource restrictions, and power 
limitations. It is the last of these challenges that we shall explore in this p ... 

10 Adaptive voltage scaling: Memory-aware energy-optimal frequency assignment for
dynamic supply voltage scaling

Youngjin Cho, Naehyuck Chang 

August 2004 Proceedings of the 2004 international symposium on Low power 
electronics and design 

Full text available: pdf(158.76 KB) Additional Information: full citation , abstract , references , index terms

Dynamic supply voltage scaling (DVS) is one of the best ways to reduce the energy 
consumption of a device when there is a super-linear relationship between energy and 
supply voltage, and a pseudo-linear relationship between delay and supply voltage. 
However, most DVS schemes scale the clock frequency of the supply-voltage-clock-scalable 
(SVCS) CPU only and do not address the energy consumption of the memory. The memory 
is generally non-supply-voltage-scalable (NSVS), but its energy consumption i ... 

Keywords: SDRAM, low power, memory system 

11 Bandwidth: System capability effects on algorithms for network bandwidth
measurement

Guojun Jin, Brian L. Tierney 

October 2003 Proceedings of the 3rd ACM SIGCOMM conference on Internet 
measurement 

Full text available: pdf(254.09 KB) Additional Information: full citation , abstract , references , citings , index terms

A large number of tools that attempt to estimate network capacity and available bandwidth 
use algorithms that are based on measuring packet inter-arrival time. However in recent 
years network bandwidth has become faster than system input/output (I/O) bandwidth.
This means that it is getting harder and harder to estimate capacity and available 
bandwidth using these techniques. This paper examines the current bandwidth 
measurement and estimation algorithms, and presents an analysis of how these al ... 

Keywords: algorithm, bandwidth, design, estimation, measure, network, performance, 
system capability 



12 Frame-based dynamic voltage and frequency scaling for a MPEG decoder
Kihwan Choi, Karthik Dantu, Wei-Chung Cheng, Massoud Pedram 

November 2002 Proceedings of the 2002 IEEE/ACM international conference on 
Computer-aided design 

Full text available: pdf(311.84 KB) Additional Information: full citation , abstract , references , citings , index terms

This paper describes a dynamic voltage and frequency scaling (DVFS) technique for MPEG 
decoding to reduce the energy consumption while maintaining a quality of service (QoS)
constraint. The computational workload for an incoming frame is predicted using a frame- 
based history so that the processor voltage and frequency can be scaled to provide the 
exact amount of computing power needed to decode the frame. More precisely, the 
required decoding time for each frame is separated into two parts: a fram ... 

13 Session 3: Scalability and resource usage of an OLAP benchmark on clusters of PCs
Michela Taufer, Thomas Stricker, Roger Weber

August 2002 Proceedings of the fourteenth annual ACM symposium on Parallel 
algorithms and architectures 

Full text available: pdf(219.90 KB) Additional Information: full citation , abstract , references , index terms

Designing clusters of PCs for distributed databases processing OLAP (On Line Analytical
Processing) workloads in parallel with good scalability remains a particular challenge as we 
are lacking a deep understanding of the architectural issues around resource usage by 
standard DBMSs on distributed platforms. To address this problem, we present a novel
performance monitoring framework for filtering and abstracting samples of performance 
data from low level counters into a high level performance pictu ... 

Keywords: cluster of PCs, distributed OLAP processing, parallel databases, performance 
analysis, workload characterization 



14 Effects of clock resolution on the scheduling of interactive and soft real-time processes
Yoav Etsion, Dan Tsafrir, Dror G. Feitelson 

June 2003 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 
2003 ACM SIGMETRICS international conference on Measurement and 
modeling of computer systems, Volume 31 Issue 1

Full text available: pdf(512.91 KB) Additional Information: full citation , abstract , references , citings , index terms

It is commonly agreed that scheduling mechanisms in general purpose operating systems 
do not provide adequate support for modern interactive applications, notably multimedia 
applications. The common solution to this problem is to devise specialized scheduling 
mechanisms that take the specific needs of such applications into account. A much simpler 
alternative is to better tune existing systems. In particular, we show that conventional 
scheduling algorithms typically only have little and possibly ... 

Keywords: Linux, clock interrupt rate, interactive process, overhead, scheduling, soft real- 
time, tuning 

15 High performance software on Intel Pentium Pro processors or Micro-Ops to
TeraFLOPS 
Bruce Greer, Greg Henry 

November 1997 Proceedings of the 1997 ACM/IEEE conference on Supercomputing 
(CDROM) 

Full text available: pdf(101.17 KB) Additional Information: full citation , abstract , references

This paper gives a technical discussion of the Intel Pentium® Pro processor and 
optimization strategies used to achieve high performance on scientific applications. We 
demonstrate these optimizations by characterizing matrix multiplication (DGEMM). We give 
insight and a model into our efforts on obtaining the world's first TeraFLOP MP LINPACK run
(on the Intel ASCI Option Red Supercomputer), based on Pentium Pro processor 
technology. The importance of this paper is carried by the increasing ... 

Keywords: ASCI Red, BLAS, DGEMM, MP LINPACK, TeraFLOP, optimization

16 An adaptive algorithm for low-power streaming multimedia processing
A. Acquaviva, L. Benini, B. Ricco 

March 2001 Proceedings of the conference on Design, automation and test in Europe 

Full text available: pdf(135.54 KB) Additional Information: full citation , references , citings , index terms



17 Operating system benchmarking in the wake of lmbench: a case study of the
performance of NetBSD on the Intel x86 architecture

Aaron B. Brown, Margo I. Seltzer 

June 1997 ACM SIGMETRICS Performance Evaluation Review , Proceedings of the 
1997 ACM SIGMETRICS international conference on Measurement and 
modeling of computer systems, Volume 25 Issue 1

Full text available: pdf(1.98 MB) Additional Information: full citation , abstract , references , citings , index terms

The lmbench suite of operating system microbenchmarks provides a set of portable
programs for use in cross-platform comparisons. We have augmented the lmbench suite to
increase its flexibility and precision, and to improve its methodological and statistical
operation. This enables the detailed study of interactions between the operating system and
the hardware architecture. We describe modifications to lmbench, and then use our new
benchmark suite, hbench:OS, to exami ...

18 Effective Software-Based Self-Test Strategies for On-Line Periodic Testing of
Embedded Processors

Antonis Paschalis, Dimitris Gizopoulos 

February 2004 Proceedings of the conference on Design, automation and test in Europe 
- Volume 1 

Full text available: pdf(125.28 KB) Additional Information: full citation , abstract , index terms

Software-based self-test (SBST) strategies are particularly useful for periodic testing of 
deeply embedded processors in low-cost embedded systems that do not require immediate 
detection of errors and cannot afford the well-known hardware, software, or time 
redundancy mechanisms. In this paper, first, we identify the stringent characteristics of an 
SBST test program to be suitable for on-line periodic testing. Then, we introduce a new 
SBST methodology with a new classification scheme for processor ...

19 Reception and posters: Universal synchronization scheme for distributed audio-video
capture on heterogeneous computing platforms
Rainer Lienhart, Igor Kozintsev, Stefan Wehr 

November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: pdf(176.95 KB) Additional Information: full citation , abstract , references , index terms

We propose a universal synchronization scheme for distributed audio-video capture on 
heterogeneous computing devices such as laptops, tablets, PDAs, cellular phones, audio 
recorders, and camcorders. These devices typically possess sensors such as microphones 
and possibly cameras. In order to combine them wirelessly into a distributed sensing and 
computing system, it is necessary to provide relative time synchronization among the 
distributed sensors. In this work we propose a setup and an algorit ... 

Keywords: distributed audio-video processing, distributed audio-video synchronization, 
distributed microphone array 



20 Support for real time and OS services in embedded systems: Hardware support for 
real-time operating systems 
Paul Kohout, Brinda Ganesh, Bruce Jacob 

October 2003 Proceedings of the 1st IEEE/ACM/IFIP international conference on 
Hardware/software codesign and system synthesis

Full text available: pdf(447.93 KB) Additional Information: full citation , abstract , references , citings , index terms

The growing complexity of embedded applications and pressure on time-to-market has 
resulted in the increasing use of embedded real-time operating systems. Unfortunately, 
RTOSes can introduce a significant performance degradation. This paper presents the Real- 
Time Task Manager (RTM)--a processor extension that minimizes the performance 
drawbacks associated with RTOSes. The RTM accomplishes this by supporting, in hardware, 
a few of the common RTOS operations that are performance bottlenecks: task ... 

Keywords: RTOS, hardware-software codesign 